Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf.

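Consider the root

$$R_n(x_n, \theta(P)) = \tau_n(\hat\theta_n - \theta(P)).$$

Example: $\hat\theta_n = \bar X_n$, $\tau_n = \sqrt{n}$, $\theta = EX = \mu(P)$; or $\hat\theta_n = \min_i X_i$, $\tau_n = n$, $\theta(P) = \sup\{x : F(x) \le 0\}$.

Define: $J_n(P)$, the distribution of $\tau_n(\hat\theta_n - \theta(P))$ under $P$. For real $\hat\theta_n$,

$$J_n(x, P) \equiv \mathrm{Prob}_P\{\tau_n(\hat\theta_n - \theta(P)) \le x\}.$$

Since $P$ is unknown, $\theta(P)$ is unknown, and $J_n(x, P)$ is also unknown.

The bootstrap estimates $J_n(x, P)$ by $J_n(x, \hat P_n)$, where $\hat P_n$ is a consistent estimate of $P$ in some sense. For example, take $\hat P_n(x) = \frac{1}{n}\sum_{i=1}^n 1(X_i \le x)$, the empirical distribution, for which $\sup_x |\hat P_n(x) - P(x)| \to 0$ a.s.

Similarly, estimate the $\alpha$th quantile of $J_n(x, P)$ by that of $J_n(x, \hat P_n)$: i.e. estimate $J_n^{-1}(\alpha, P)$ by $J_n^{-1}(\alpha, \hat P_n)$.

Usually $J_n(x, \hat P_n)$ can't be calculated explicitly (although in some simple cases it can be), so use a Monte Carlo approximation:

$$J_n(x, \hat P_n) \approx \frac{1}{B}\sum_{i=1}^B 1\big(\tau_n(\hat\theta_{n,i} - \hat\theta_n) \le x\big), \qquad \hat\theta_{n,i} = \hat\theta(X^*_{1,i}, \ldots, X^*_{n,i}).$$

When the bootstrap works (the meaning of "works"): for each $x$,

$$J_n(x, \hat P_n) - J_n(x, P) \to 0 \quad\Longrightarrow\quad J_n^{-1}(\alpha, \hat P_n) - J_n^{-1}(\alpha, P) \to 0.$$

When should the bootstrap work? Need local uniformity in weak convergence: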

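1. Usually $J_n(x, P) \to J(x, P)$.
2. Also usually $\hat P_n \to P$ a.s. in some sense, say $\sup_x |\hat P_n(x) - P(x)| \to 0$ a.s.
3. Suppose that for each sequence $P_n$ such that $P_n \to P$, say $\sup_x |P_n(x) - P(x)| \to 0$, it is also true that $J_n(x, P_n) \to J(x, P)$. Then it must be true that a.s. $J_n(x, \hat P_n) \to J(x, P)$.
4. So it ends up having to show that for $P_n \to P$, $J_n(x, P_n) \to J(x, P)$; use a triangular array formulation.

Case when it works: sample mean with finite variance. It is known that:

1. $\sup_x |\hat F_n(x) - F(x)| \to 0$ a.s.
2. $\theta(\hat F_n) = \frac{1}{n}\sum_{i=1}^n X_i \to \theta(F) = EX$ a.s.
3. $\sigma^2(\hat F_n) = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2 \to \sigma^2(F) = \mathrm{Var}\,X$ a.s.
4. Use Lindeberg-Feller for the triangular array, applied to a deterministic sequence $P_n$ such that (1) $\sup_x |P_n(x) - P(x)| \to 0$; (2) $\theta(P_n) \to \theta(P)$; (3) $\sigma^2(P_n) \to \sigma^2(P)$. It can be shown that $\sqrt{n}(\bar X_n - \theta(P_n)) \to N(0, \sigma^2)$ under $P_n$.
5. Since $\hat P_n$ satisfies 1, 2, 3 a.s., it follows that a.s. $J_n(x, \hat P_n) \to J(x, P)$.

Therefore local uniformity of weak convergence is satisfied here.

Cases when the bootstrap fails:

1. Order statistics: $F \sim U(0, \theta)$, and $X_{(1)}, \ldots, X_{(n)}$ are the order statistics of the sample, so $X_{(n)}$ is the maximum:

$$P\!\left(\frac{n(\theta - X_{(n)})}{\theta} > x\right) = P\!\left(X_{(n)} < \theta - \frac{\theta x}{n}\right) = P\!\left(X_i < \theta - \frac{\theta x}{n}\right)^{n} = \left(\frac{1}{\theta}\Big(\theta - \frac{\theta x}{n}\Big)\right)^{n} = \left(1 - \frac{x}{n}\right)^{n} \to e^{-x}.$$

The bootstrap version:

$$P^*\!\left(\frac{n(X_{(n)} - X^*_{(n)})}{X_{(n)}} = 0\right) = 1 - \left(1 - \frac{1}{n}\right)^{n} \to 1 - e^{-1} \approx 0.63.$$

The bootstrap distribution therefore puts mass of about 0.63 at zero, while the limit law $e^{-x}$ is continuous.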

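2. Degenerate U-statistics: Take $w(x, y) = xy$, so that $\theta(F) = \iint w(x, y)\,dF(x)\,dF(y) = \mu(F)^2$, and

$$\hat\theta_n = \theta(\hat F_n) = \frac{1}{n(n-1)}\sum_{i \ne j} X_i X_j, \qquad S(x) = \int xy\,dF(y) = x\,\mu(F).$$

If $\mu(F) \ne 0$ it is known that

$$\sqrt{n}(\hat\theta_n - \theta) \to N\big(0, 4\,\mathrm{Var}(S(X))\big) = N\big(0, 4(\mu^2 EX^2 - \mu^4)\big).$$

The bootstrap works.

But if $\mu(F) = 0$, then $\theta(F) = 0$ and

$$\theta(\hat F_n) = \frac{1}{n(n-1)}\sum_{i \ne j} X_i X_j = \bar X_n^2 - \frac{S_n^2}{n}, \qquad S_n^2 = \frac{1}{n-1}\sum_i (X_i - \bar X_n)^2,$$

$$n\big(\theta(\hat F_n) - \theta(F)\big) = n\bar X_n^2 - S_n^2 \to N(0, \sigma^2)^2 - \sigma^2.$$

However, the bootstrap version $n[\theta(\hat F_n^*) - \theta(\hat F_n)]$ is

$$n\Big[\bar X_n^{*2} - \frac{S_n^{*2}}{n}\Big] - n\Big[\bar X_n^{2} - \frac{S_n^{2}}{n}\Big] = n\bar X_n^{*2} - S_n^{*2} - n\bar X_n^{2} + S_n^{2} = \big[\sqrt{n}(\bar X_n^* - \bar X_n)\big]^2 + 2\sqrt{n}(\bar X_n^* - \bar X_n)\,\sqrt{n}\bar X_n - (S_n^{*2} - S_n^2),$$

which behaves like $N(0, \sigma^2)^2 + 2N(0, \sigma^2)\sqrt{n}\bar X_n$, since $S_n^{*2} - S_n^2 \to 0$. Because $\sqrt{n}\bar X_n$ does not vanish (it is itself asymptotically $N(0, \sigma^2)$), the bootstrap distribution does not converge to the limit $N(0, \sigma^2)^2 - \sigma^2$.

Subsampling:

iid case: let $Y_i$ be a block of size $b$ from $(X_1, \ldots, X_n)$, $i = 1, \ldots, q$, for $q = \binom{n}{b}$. Let $\hat\theta_{n,b,i} = \hat\theta(Y_i)$ be calculated from the $i$th block of data. Use the empirical distribution of $\tau_b(\hat\theta_{n,b,i} - \hat\theta_n)$ over the $q$ pseudo-estimates to approximate the distribution of $\tau_n(\hat\theta_n - \theta)$: approximate

$$J_n(x, P) = P\big(\tau_n(\hat\theta_n - \theta) \le x\big) \quad\text{by}\quad L_{n,b}(x) = \frac{1}{q}\sum_{i=1}^{q} 1\big(\tau_b(\hat\theta_{n,b,i} - \hat\theta_n) \le x\big).$$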
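The order-statistics failure case and the subsampling estimate $L_{n,b}(x)$ can be compared numerically. The following is a minimal sketch, not part of the original notes. It assumes $U(0,1)$ data (so $\theta = 1$ and the limit law of $n(X_{(n)} - \theta)$ is $J(x, P) = e^{x}$ for $x \le 0$), and it draws a fixed number of random size-$b$ subsets instead of enumerating all $\binom{n}{b}$ of them, which is an approximation introduced here. The helper names `subsample_roots`, `bootstrap_roots` and `ecdf` are illustrative.

```python
import math
import random

def subsample_roots(x, b, stat, tau, n_draws, rng):
    """tau(b) * (stat(Y_i) - stat(x)) for n_draws random size-b subsets Y_i."""
    theta_n = stat(x)
    # rng.sample draws without replacement, so each Y_i is a genuine subset.
    return [tau(b) * (stat(rng.sample(x, b)) - theta_n) for _ in range(n_draws)]

def bootstrap_roots(x, stat, tau, n_draws, rng):
    """tau(n) * (stat(X*) - stat(x)) for n_draws resamples X* of size n."""
    n, theta_n = len(x), stat(x)
    # rng.choices draws with replacement from the empirical distribution.
    return [tau(n) * (stat(rng.choices(x, k=n)) - theta_n) for _ in range(n_draws)]

def ecdf(values, point):
    """Empirical distribution function of `values` evaluated at `point`."""
    return sum(v <= point for v in values) / len(values)

rng = random.Random(0)
n, b = 20000, 100
x = [rng.random() for _ in range(n)]   # assumed U(0, 1) data, so theta = 1
rate = lambda m: m                     # tau_m = m for the sample maximum

sub = subsample_roots(x, b, max, rate, 2000, rng)
boot = bootstrap_roots(x, max, rate, 300, rng)

limit_at_minus_1 = math.exp(-1.0)      # J(-1, P) = e^{-1} when theta = 1
sub_cdf_at_minus_1 = ecdf(sub, -1.0)   # subsampling estimate L_{n,b}(-1)
boot_atom = sum(r == 0.0 for r in boot) / len(boot)  # bootstrap mass at 0
sub_atom = sum(r == 0.0 for r in sub) / len(sub)     # subsampling mass at 0
print(sub_cdf_at_minus_1, limit_at_minus_1, boot_atom, sub_atom)
```

Under these assumptions the bootstrap roots should show a large point mass at zero (near $1 - e^{-1}$), while a subset of size $b$ contains the sample maximum only with probability $b/n$, so the subsampling roots show almost none.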

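Claim: If $b \to \infty$, $b/n \to 0$, $\tau_b/\tau_n \to 0$, then as long as $\tau_n(\hat\theta_n - \theta)$ converges in distribution to something,

$$J_n(x, P) - L_{n,b}(x) \to 0 \quad\text{in probability.}$$

Different motivation for subsampling vs. bootstrap:

Subsampling: each subset of size $b$ comes from the TRUE model. Since $\tau_n(\hat\theta_n - \theta) \to J(x, P)$ in distribution, as long as $b \to \infty$ we also have $\tau_b(\hat\theta_b - \theta) \to J(x, P)$. For large $n$, the distributions of $\tau_n(\hat\theta_n - \theta)$ and $\tau_b(\hat\theta_b - \theta)$ should be close. But since

$$\tau_b(\hat\theta_b - \theta) = \tau_b(\hat\theta_b - \hat\theta_n) + \tau_b(\hat\theta_n - \theta), \qquad \tau_b(\hat\theta_n - \theta) = \frac{\tau_b}{\tau_n}\,\tau_n(\hat\theta_n - \theta) = O_p\!\left(\frac{\tau_b}{\tau_n}\right) = o_p(1),$$

the distributions of $\tau_b(\hat\theta_b - \theta)$ and $\tau_b(\hat\theta_b - \hat\theta_n)$ should be close. The distribution of $\tau_b(\hat\theta_b - \hat\theta_n)$ is estimated by the empirical distribution over the $q = \binom{n}{b}$ pseudo-estimates.

Bootstrap: recalculate the statistic from the ESTIMATED model $\hat P_n$. Given that $\hat P_n$ is close to $P$, hopefully $J_n(x, \hat P_n)$ is close to $J_n(x, P)$, or to $J(x, P)$, the limit distribution. But when the bootstrap fails, $\hat P_n \to P$ does not imply $J_n(x, \hat P_n) \to J(x, P)$.

Formal proof of consistency of subsampling:

Assumptions: $\tau_n(\hat\theta_n - \theta) \to J(x, P)$ in distribution, $b \to \infty$, $b/n \to 0$, $\tau_b/\tau_n \to 0$.

Need to show: $L_{n,b}(x) - J(x, P) \to 0$ in probability.

Since $\tau_b(\hat\theta_n - \theta) \to 0$ in probability, it is enough to show

$$U_{n,b}(x) = \frac{1}{q}\sum_{i=1}^{q} 1\big(\tau_b(\hat\theta_{n,b,i} - \theta) \le x\big) \to J(x, P) \quad\text{in probability.}$$

Decompose

$$U_{n,b}(x) - J(x, P) = \big[U_{n,b}(x) - EU_{n,b}(x)\big] + \big[EU_{n,b}(x) - J(x, P)\big].$$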

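It is enough to show $U_{n,b}(x) - EU_{n,b}(x) \to 0$ in probability and $EU_{n,b}(x) - J(x, P) \to 0$. But

$$EU_{n,b}(x) - J(x, P) = J_b(x, P) - J(x, P) \to 0.$$

$U_{n,b}(x)$ is a $b$th order U-statistic with a bounded kernel (an indicator, so $l \le$ kernel $\le u$ with $l = 0$, $u = 1$). Use the Hoeffding exponential-type inequality (Serfling 1980, Thm A, p. 201):

$$P\big(U_{n,b}(x) - J_b(x, P) \ge \epsilon\big) \le \exp\!\left(-\frac{2\,[n/b]\,\epsilon^2}{(u - l)^2}\right) \to 0 \quad\text{as } n/b \to \infty.$$

So

$$L_{n,b}(x) - J(x, P) = \big[L_{n,b}(x) - U_{n,b}(x)\big] + \big[U_{n,b}(x) - J_b(x, P)\big] + \big[J_b(x, P) - J(x, P)\big] \to 0 \quad\text{in probability.}$$

Q.E.D.

Time series: respect the ordering of the data to preserve correlation. Use the overlapping blocks

$$\hat\theta_{n,b,t} = \hat\theta_b(X_t, \ldots, X_{t+b-1}), \qquad q = T - b + 1,$$

where $T = n$ is the length of the series, and

$$L_{n,b}(x) = \frac{1}{q}\sum_{t=1}^{q} 1\big(\tau_b(\hat\theta_{n,b,t} - \hat\theta_n) \le x\big).$$

Assumption: $\tau_n(\hat\theta_n - \theta) \to J(x, P)$ in distribution, $b \to \infty$, $b/n \to 0$, $\tau_b/\tau_n \to 0$, and mixing coefficients $\alpha(m) \to 0$.

Result: $L_{n,b}(x) - J(x, P) \to 0$ in probability.

Most difficult part: to show $\tau_n(\hat\theta_n - \theta) \to J(x, P)$ in distribution.

One can treat iid data as a time series, or even use only the $k = [n/b]$ non-overlapping blocks, but using all $\binom{n}{b}$ subsets is more efficient. For example, if

$$\bar U_n(x) = \frac{1}{k}\sum_{j=1}^{k} 1\big(\tau_b[R_{n,b,j} - \theta(P)] \le x\big),$$

with $R_{n,b,j}$ computed on the $j$th non-overlapping block, then

$$U_{n,b}(x) = E\big[\bar U_n(x) \mid \mathbf{X}_n\big] = E\big[1\big(\tau_b[R_{n,b,j} - \theta(P)] \le x\big) \mid \mathbf{X}_n\big]$$

for $\mathbf{X}_n = (X_{(1)}, \ldots, X_{(n)})$. $U_{n,b}(x)$ is better than $\bar U_n(x)$ since $\mathbf{X}_n$ is a sufficient statistic for iid data.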
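The time-series construction with $q = n - b + 1$ overlapping blocks can be sketched as follows. This is an illustrative example, not taken from the notes: it assumes an AR(1) series with coefficient `rho = 0.5`, the sample mean as $\hat\theta$, and $\tau_b = \sqrt{b}$; the function name `block_subsample_roots` is hypothetical. The spread of the subsampling roots is compared with the long-run variance $1/(1-\rho)^2$ of the series, which is what the limit law of $\sqrt{n}(\bar X_n - \mu)$ depends on.

```python
import math
import random

def block_subsample_roots(x, b, tau):
    """Roots tau(b) * (block mean - overall mean) over all overlapping blocks."""
    n = len(x)
    prefix = [0.0]
    for value in x:                       # prefix sums give O(1) block means
        prefix.append(prefix[-1] + value)
    theta_n = prefix[n] / n
    q = n - b + 1
    # Block t covers x[t], ..., x[t + b - 1]; the time order inside is preserved.
    return [tau(b) * ((prefix[t + b] - prefix[t]) / b - theta_n) for t in range(q)]

rng = random.Random(1)
n, b, rho = 5000, 50, 0.5                 # assumed sample size, block size, AR(1) coefficient

# AR(1) series started at zero (an assumption; the start-up effect is negligible here).
x, prev = [], 0.0
for _ in range(n):
    prev = rho * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

roots = block_subsample_roots(x, b, math.sqrt)
q = len(roots)
mean_root = sum(roots) / q
var_roots = sum((r - mean_root) ** 2 for r in roots) / q
long_run_var = 1.0 / (1.0 - rho) ** 2     # variance of the limit law for this AR(1)
print(q, var_roots, long_run_var)
```

Because each block keeps consecutive observations together, the variance of the roots should land near the long-run variance rather than near the marginal variance $1/(1-\rho^2)$ that shuffled subsets would reflect.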

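Hypothesis testing:

$$T_n = \tau_n t_n(X_1, \ldots, X_n), \qquad G_n(x, P) = \mathrm{Prob}_P\{T_n \le x\}, \qquad G_n(x, P) \to G(x, P) \text{ for } P \in \mathbf{P}_0,$$

$$\hat G_{n,b}(x) = \frac{1}{q}\sum_{i=1}^{q} 1(T_{n,b,i} \le x) = \frac{1}{q}\sum_{i=1}^{q} 1(\tau_b t_{n,b,i} \le x).$$

As long as $b \to \infty$ and $b/n \to 0$, under $P \in \mathbf{P}_0$: $\hat G_{n,b}(x) \to G(x, P)$.

If under $P \in \mathbf{P}_1$, $T_n \to \infty$, then for all $x$, $\hat G_{n,b}(x) \to 0$.

Key difference with the confidence interval: there is no need for $\tau_b/\tau_n \to 0$, because there is no need to estimate $\theta_0$; it is assumed known under the null hypothesis.

Estimating the unknown rate of convergence:

Assume that $\tau_n = n^\beta$ for some $\beta > 0$, but $\beta$ is unknown. Estimate $\beta$ using subsampling distributions with different block sizes.

Key idea: compare the shape of the empirical distributions of $\hat\theta_b - \hat\theta_n$ for different values of $b$ to infer the value of $\beta$.

Let $q = \binom{n}{b}$ for iid data, or $q = T - b + 1$ for time series data, and define

$$L_{n,b}(x \mid \tau_b) \equiv \frac{1}{q}\sum_{a=1}^{q} 1\big(\tau_b(\hat\theta_{n,b,a} - \hat\theta_n) \le x\big), \qquad L_{n,b}(x \mid 1) \equiv \frac{1}{q}\sum_{a=1}^{q} 1\big(\hat\theta_{n,b,a} - \hat\theta_n \le x\big).$$

This implies

$$L_{n,b}(x \mid \tau_b) = L_{n,b}(\tau_b^{-1} x \mid 1) \equiv t,$$

which is the same as

$$x = L_{n,b}^{-1}(t \mid \tau_b) = \tau_b(\tau_b^{-1} x) = \tau_b L_{n,b}^{-1}(t \mid 1).$$

Since $L_{n,b}(x \mid \tau_b) \to J(x, P)$ in probability, if $J(x, P)$ is continuous and increasing, it can be inferred that

$$L_{n,b}^{-1}(t \mid \tau_b) = J^{-1}(t, P) + o_p(1), \quad\text{i.e.}\quad \tau_b L_{n,b}^{-1}(t \mid 1) = J^{-1}(t, P) + o_p(1).$$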

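So

$$b^\beta L_{n,b}^{-1}(t \mid 1) = J^{-1}(t, P) + o_p(1).$$

Take logs (assuming $J^{-1}(t, P) > 0$, or $t > J(0, P)$). For different $b_1$ and $b_2$ this becomes

$$\beta \log b_1 + \log L_{n,b_1}^{-1}(t \mid 1) = \log J^{-1}(t, P) + o_p(1),$$
$$\beta \log b_2 + \log L_{n,b_2}^{-1}(t \mid 1) = \log J^{-1}(t, P) + o_p(1).$$

Difference out the fixed effect:

$$\beta(\log b_1 - \log b_2) = \log L_{n,b_2}^{-1}(t \mid 1) - \log L_{n,b_1}^{-1}(t \mid 1) + o_p(1).$$

So estimate $\beta$ by

$$\hat\beta = (\log b_1 - \log b_2)^{-1}\big(\log L_{n,b_2}^{-1}(t \mid 1) - \log L_{n,b_1}^{-1}(t \mid 1)\big) = \beta + (\log b_1 - \log b_2)^{-1} \times o_p(1).$$

Take $b_1 = n^{\gamma_1}$, $b_2 = n^{\gamma_2}$, $\gamma_1 > \gamma_2 > 0$:

$$\hat\beta - \beta = \big((\gamma_1 - \gamma_2)\log n\big)^{-1} o_p(1) = o_p\big((\log n)^{-1}\big).$$

How to know that $t > J(0, P)$? Since

$$L_{n,b}(0 \mid \tau_b) = L_{n,b}(0 \mid 1) = J(0, P) + o_p(1),$$

estimating $J(0, P)$ is not a problem.

Alternatively, take $t_2 \in (0.5, 1)$ and $t_1 \in (0, 0.5)$:

$$b^\beta\big(L_{n,b}^{-1}(t_2 \mid 1) - L_{n,b}^{-1}(t_1 \mid 1)\big) = J^{-1}(t_2 \mid P) - J^{-1}(t_1 \mid P) + o_p(1),$$

$$\beta \log b + \log\big(L_{n,b}^{-1}(t_2 \mid 1) - L_{n,b}^{-1}(t_1 \mid 1)\big) = \log\big(J^{-1}(t_2 \mid P) - J^{-1}(t_1 \mid P)\big) + o_p(1),$$

$$\hat\beta = (\log b_1 - \log b_2)^{-1}\Big[\log\big(L_{n,b_2}^{-1}(t_2 \mid 1) - L_{n,b_2}^{-1}(t_1 \mid 1)\big) - \log\big(L_{n,b_1}^{-1}(t_2 \mid 1) - L_{n,b_1}^{-1}(t_1 \mid 1)\big)\Big].$$

Take $b_1 = n^{\gamma_1}$, $b_2 = n^{\gamma_2}$, $1 > \gamma_1 > \gamma_2 > 0$. As before, $\hat\beta - \beta = o_p\big((\log n)^{-1}\big)$.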
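The quantile-difference estimator of $\beta$ above can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: it assumes iid $N(0,1)$ data with the sample mean as $\hat\theta$ (so the true rate is $\beta = 1/2$), uses $t_1 = 0.25$, $t_2 = 0.75$, $\gamma_1 = 0.6$, $\gamma_2 = 0.4$, and approximates each $L_{n,b}(\cdot \mid 1)$ with random subsets rather than all $\binom{n}{b}$. All function names are hypothetical.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def unscaled_roots(x, b, stat, n_draws, rng):
    """stat(Y_a) - stat(x) over random size-b subsets: draws from L_{n,b}(. | 1)."""
    theta_n = stat(x)
    return [stat(rng.sample(x, b)) - theta_n for _ in range(n_draws)]

def quantile(values, t):
    """Empirical t-quantile: smallest value whose ECDF is at least t."""
    ordered = sorted(values)
    return ordered[max(math.ceil(t * len(ordered)) - 1, 0)]

def estimate_rate(x, b1, b2, stat, t1, t2, n_draws, rng):
    """beta_hat from the spread of the unscaled roots at two block sizes."""
    spread = {}
    for b in (b1, b2):
        roots = unscaled_roots(x, b, stat, n_draws, rng)
        spread[b] = quantile(roots, t2) - quantile(roots, t1)
    return (math.log(spread[b2]) - math.log(spread[b1])) / (math.log(b1) - math.log(b2))

rng = random.Random(2)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]     # assumed iid N(0, 1) data
b1, b2 = round(n ** 0.6), round(n ** 0.4)       # b1 = n^gamma1 > b2 = n^gamma2
beta_hat = estimate_rate(x, b1, b2, mean, 0.25, 0.75, 2000, rng)
print(b1, b2, beta_hat)
```

Using a difference of two quantiles avoids having to check the sign condition $t > J(0, P)$, since the spread between an upper and a lower quantile is positive by construction.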

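Two-step subsampling: set $\hat\tau_n = n^{\hat\beta}$ and

$$L_{n,b}(x \mid \hat\tau_b) = \frac{1}{q}\sum_{a=1}^{q} 1\big(\hat\tau_b(\hat\theta_{n,b,a} - \hat\theta_n) \le x\big).$$

It can be shown that $\sup_x |L_{n,b}(x \mid \hat\tau_b) - J(x, P)| \to 0$ in probability.

Problem: imprecise in small samples. E.g. in variation estimation, the best choice of $b$ gives an error rate of $O(n^{-1/3})$, but parameter estimates, if the model is true, give an $O(n^{-1/2})$ error rate. Bootstrapping pivotal statistics, when applicable, gives an error rate even better than $O(n^{-1/2})$.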
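The two steps can be put together as in the sketch below, which is an illustration under stated assumptions rather than a reference implementation: iid $N(0,1)$ data, the sample mean as $\hat\theta$ (so $J(x, P)$ is the standard normal cdf), block sizes 400 and 50 for the rate step and $b = 100$ for the final step, and random subsets in place of all $\binom{n}{b}$. The gap is evaluated only on a small grid of points, not as a true supremum.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def unscaled_roots(x, b, n_draws, rng):
    """mean(Y_a) - mean(x) over random size-b subsets of x."""
    theta_n = mean(x)
    return [mean(rng.sample(x, b)) - theta_n for _ in range(n_draws)]

def spread(values, t1=0.25, t2=0.75):
    """Difference between the empirical t2- and t1-quantiles."""
    ordered = sorted(values)
    pick = lambda t: ordered[max(math.ceil(t * len(ordered)) - 1, 0)]
    return pick(t2) - pick(t1)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = random.Random(3)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]     # assumed iid N(0, 1) data

# Step 1: estimate beta from the spread of unscaled roots at two block sizes.
b1, b2 = 400, 50
beta_hat = (math.log(spread(unscaled_roots(x, b2, 2000, rng)))
            - math.log(spread(unscaled_roots(x, b1, 2000, rng)))) / (math.log(b1) - math.log(b2))

# Step 2: plug tau_hat_b = b ** beta_hat into the subsampling distribution.
b = 100
tau_hat_b = b ** beta_hat
scaled = [tau_hat_b * r for r in unscaled_roots(x, b, 2000, rng)]

# Largest gap between L_{n,b}(z | tau_hat_b) and the N(0, 1) cdf on a small grid.
grid = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
max_gap = max(abs(sum(r <= z for r in scaled) / len(scaled) - normal_cdf(z)) for z in grid)
print(beta_hat, max_gap)
```

An error in $\hat\beta$ is magnified through $b^{\hat\beta}$, which gives one concrete view of why the two-step procedure can be imprecise in small samples.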